33 research outputs found
Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search
The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a
general technique for constructing a data structure to answer approximate near
neighbor queries by using a distribution over locality-sensitive
hash functions that partition space. For a collection of points, after
preprocessing, the query time is dominated by evaluations
of hash functions from and hash table lookups and
distance computations where is determined by the
locality-sensitivity properties of . It follows from a recent
result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive
hash functions can be reduced to , leaving the query time to be
dominated by distance computations and
additional word-RAM operations. We state this result as a general framework and
provide a simpler analysis showing that the number of lookups and distance
computations closely match the Indyk-Motwani framework, making it a viable
replacement in practice. Using ideas from another locality-sensitive hashing
framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of
additional word-RAM operations to .Comment: 15 pages, 3 figure
Algorithms for Stable Matching and Clustering in a Grid
We study a discrete version of a geometric stable marriage problem originally
proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which
points in the plane are stably matched to cluster centers, as prioritized by
their distances, so that each cluster center is apportioned a set of points of
equal area. We show that, for a discretization of the problem to an
grid of pixels with centers, the problem can be solved in time , and we experiment with two slower but more practical algorithms and
a hybrid method that switches from one of these algorithms to the other to gain
greater efficiency than either algorithm alone. We also show how to combine
geometric stable matchings with a -means clustering algorithm, so as to
provide a geometric political-districting algorithm that views distance in
economic terms, and we experiment with weighted versions of stable -means in
order to improve the connectivity of the resulting clusters.Comment: 23 pages, 12 figures. To appear (without the appendices) at the 18th
International Workshop on Combinatorial Image Analysis, June 19-21, 2017,
Plovdiv, Bulgari
Computing the Fréchet Distance with a Retractable Leash
All known algorithms for the Fréchet distance between curves proceed in two steps: first, they construct an efficient oracle for the decision version; second, they use this oracle to find the optimum from a finite set of critical values. We present a novel approach that avoids the detour through the decision version. This gives the first quadratic time algorithm for the Fréchet distance between polygonal curves in (Formula presented.) under polyhedral distance functions (e.g., (Formula presented.) and (Formula presented.)). We also get a (Formula presented.)-approximation of the Fréchet distance under the Euclidean metric, in quadratic time for any fixed (Formula presented.). For the exact Euclidean case, our framework currently yields an algorithm with running time (Formula presented.). However, we conjecture that it may eventually lead to a faster exact algorithm
SeqAn An efficient, generic C++ library for sequence analysis
<p>Abstract</p> <p>Background</p> <p>The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.</p> <p>Results</p> <p>To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.</p> <p>Conclusion</p> <p>We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.</p
DYNAMIC PARTITION TREES
In this paper we study dynamic variants of conjugation trees and related structures that have recently been introduced for performing various types of queries on sets of points and line segments, like half-planar range searching, shooting, intersection queries, etc. For most of these types of queries dynamic structures are obtained with an amortized update time of O(log2 n) (or less) with only minor increases in query times. As an application of the method we obtain an output-sensitive method for hidden surface removal in a set of n triangles that runs in time 0(nlog n + n.kg-gamma) where gamma = log2 ((1 + square-root 5)/2) almost-equal-to 0.695 and k is the size of the visibility map obtained
Hidden surface removal for axis-parallel polyhedra
An efficient, output-sensitive method for computing the visibility map of a set of axis-parallel polyhedra (i.e. polyhedra with their faces and edges parallel to the coordinate axes) as seen from a given viewpoint is introduced. For nonintersecting polyhedra with n edges in total, the algorithm runs in time O((n+k)log n), where k is the complexity of the visibility map. The method can handle cyclic overlap of the polyhedra and perspective views without any problem. For c-oriented polyhedra (with faces and edges in c orientations, for some constant c) the method can be extended to run in the same time bound. The method can be extended even further to deal with intersecting polyhedra with only a slight increase in the time bound